Scaling detection in time series: diffusion entropy analysis 
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The methods currently used to determine the scaling exponent of a complex dynamic process 
described by a time series are based on the numerical evaluation of variance. This means that all 
of them can be safely applied only to the case where ordinary statistical properties hold true even 
if strange kinetics are involved. We illustrate a method of statistical analysis based on the Shannon 
entropy of the diffusion process generated by the time series, called Diffusion Entropy Analysis 
(DEA). We adopt artificial Gauss and Levy time series, as prototypes of ordinary and anomalus 
statistics, respectively, and we analyse them with the DEA and four ordinary methods of analysis, 
some of which are very popular. We show that the DEA determines the correct scaling exponent 
even when the statistical properties, as well as the dynamic properties, are anomalous. The other 
four methods produce correct results in the Gauss case but fail to detect the correct scaling in the 
case of Levy statistics. 
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I. INTRODUCTION 

Scale invariancc has been found to hold empirically for 
a number of complex systems and the correct evaluation 
of the scaling exponents is of fundamental importance 
to assess if universality classes exist Q. The mathe- 
matical definition of scaling is as follows. The function 
r 2, ■ ■ ■) is termed scaling invariant, if it fulfills the 
property: 



$(ri,r 2 ,...) =7° $(7Vi,7S,...) 



(1) 



Eq. (0) means that if we scale all coordinates {r} by 
means of an appropriate choice of the exponents a,b,c...., 
then we always recover the same scaling function. The 
theoretical and experimental search for the correct scal- 
ing exponents is intimately related to the discovery of 
deviations from ordinary statistical mechanics. This as- 
pect emerges clearly, for instance, from Ref. §. The 
author of this interesting book, with the help of dimen- 
sional analysis and regularity assumption, determines the 
values of the scaling exponents. These scaling exponents, 
however, are trivial in the sense that they refer to ordi- 
nary statistical mechanics. 

In this paper we focus on the scaling of time series, and 
consequently || on the scaling of diffusion processes. In 
fact, according to the prescription of Ref. || we interpret 
the numbers of a time series as generating diffusion fluc- 
tuations, thereby shifting our attention from the time se- 
ries to the probability distribution function (pdf) p{x,t), 
where x denotes the variable collecting the fluctuations. 
In this case, if the time series is stationary, the scaling 
property of Eq. (Ill) takes the form 



p(x,t) 



F 



(?)■ 



(2) 



where 5 is the scaling exponent. The ordinary value 
of the scaling exponent is 5 = 1/2. Ordinary statistical 
mechanics is intimately related to the Central Limit The- 
orem (CLT) jjj, thereby implying for for the function F 
of Eq. (J2J) the Gaussian form. 

The main purpose of this paper is to prove that a 
technique of statistical analysis, recently introduced to 
establish the thermodynamic nature of a time series of 
sociological interest ||, affords a reliable way to eval- 
uate the scaling exponent. This method of analysis is 
based on the entropy of the diffusion process and for this 
reason is called Diffusion Entropy Analysis (DEA). We 
compare the DEA to the Standard Deviation Analysis 
(SDA) H, t0 the Detrended Fluctuation Analysis (DFA) 
||, to the Rescaled Range Analysis (RRA) (j, to the 
Spectral Wavelet Analysis (SWA) ||, and we show that, 
while all these techniques, some of which are very popu- 
lar, can yield wrong scaling exponents, the DEA always 
determines the correct value, with satisfactory precision. 
This important conclusion is reached by examining artifi- 
cial sequences generating both Gauss and Levy statistics. 
The DEA is the only technique yielding the correct scal- 
ing in both cases. The other techniques produce correct 
results only in the Gauss case but fail to detect the cor- 
rect scaling in the case of Levy statistics. 



II. DEA 

It is remarkably simple to determine the scaling pa- 
rameter 6 using DEA. First of all, we transform the time 
series into a diffusion process with a given pdf p(x,t) 
(Section 4 illustrates an algorithm to do that). Then, we 
measure the Shannon entropy 



1 



/+oo 
dx p(x, t) In [p(x, t)) . 
-oo 



(3) 



Let us suppose that p(x, t) fits the scaling condition of 
Eq.(||) and let us plug Eq.(||) into Eq.(||). After an easy 
algebra, based on changing the integration variable from 
x into y — x/t s , we obtain 



where 



and 



.4 



S(t) =A + 6t, 



dyF{y) \n[F(y)}, 



(4) 



(5) 



r = ln(t). (6) 

It is evident that the origin of this logarithmic time, 
r = 0, is related to t — 1, and this, in turn, depends 
on the time unit adopted. However, this arbitrary choice 
in no way affects the scaling parameter, which is given 
by the slope of the straight line S(t) of Eq. (||). The 
adoption of a different time unit would change a given 
straigh line into a new one, parallel to the original, and 
thus bearing the same scaling parameter S. 

III. GAUSS AND LEVY DIFFUSION 

The rigorous definition of diffusion scaling is that of 
Eq. (Q) ■ Frequently, the scaling property is also expressed 
by means of 



x oc t° 



(7) 



Let us see now why the scaling paramatcr 5 is often eval- 
uated through the second moment of the pdf p(x, t). In 
the long-time limit the variable x{t) collecting the fluc- 
tuations £ (t) has a time evolution equivalent to 



x (t) = m , 



By time integration we get 



x(t) = x(0) + / dt' f (*') 



(8) 



(9) 



Let us imagine a set of infinitely many trajectories of 
the type of that of Eq. (||). As to the second moment 

< x 2 (t) >, we evaluate its time evolution by squaring 
Eq.(^|) and averaging over all the trajectories of this set. 
Under the assumption that the process is stationary and 
that < £(t) >= 0, it is straightforward to obtain 

< x 2 (t) >=< x 2 (0) > +2 < i 2 > f dt x f 1 dt 2 <5> s (\t 2 \). 

Jo Jo 

(10) 



Note that to get this result we use the equilibrium corre- 
lation function 



< £(*l)g(*2) > 

<e> ' 



(ii) 



Under the stationary condition, this correlation function 
depends only on the time difference, namely, $5(^1^2) = 
^{(l^i — ^2 1), and this property, with the help of a suitable 
change of integration variables, yields Eq. (|i"o|) . 

What is the connection between second moment and 
scaling? Having in mind Eq.(Q) one would be tempted 
to make the conjecture that 



< x 2 {t) >oc t 



26 



(12) 



However, this conjecture is not correct in general, and to 
be more rigorous let us replace Eq.([l2"|) with 



< x 2 (t) >oc t 



2H 



(13) 



The adoption of the symbol H rather than the symbol 
S is dictated by two main reasons. First of all, H is the 
symbol frequently used in literature to denote the scaling 
parameter. In fact, the pioneer work of Mandelbrot, in 
the special case of Fractional Brownian Motion (FBM) 
H , identified the Hurst coefficient H with the scaling 
parameter. This is certainly correct, but the validity of 
this conclusion is confined to the Gaussian case. In gen- 
eral, as we shall see, the asymptotic behavior of Eq.jT^) 
does not imply that 5 — H. This is the second rea- 
son why we prefer to use Eq.(|l^) rather than Eq. (fl2"|) . 
In conclusion, we denote by 8 a parameter implying the 
property of Eq. (^) and by H the value corresponding to 
the property of Eq.jT^). We shall see that even when 
H =/= 5, the parameter H might have a physical meaning 
of some interest, even if it is not the real scaling. 

To prove under which conditions the equality S = H 
applies, and consequently Eq. (p^), as well as Eq. (p^[) , 
is correct, let us notice that under the assumption that 
the fluctuation £(i) is Gaussian, and with no other as- 
sumption, we can prove that the pdf p(x, t) fulfills the 
following diffusion equation 



dp(x, t) 
dt 



d 2 



where 



D(t) =< S, 2 > / §z(t')dt' 



(14) 



(15) 



The proof of this important result rests on the cumulant 
theory of Ref. |l(| and the interested reader can derive 
it from a more general case discussed in Ref. [[y]]. It is 
straightforward to show that the general solution of Eq. 
( |i"4| ) , for a set of particles initially located at x — , is 



p(x,t) 



^Jl-K < X 2 (t) > 



exp 



2 < x 2 {t) > 



(16) 



2 



where < x 2 (t) > is the second moment with the time 



evolution described by Eq.([10|). It is easy to show that 
the time asymptotic properties of the second moment are 
compatible with Eq.(|l3|), with H ranging from to 1, in 
the case where 



lim t ^co^^(t) = sign 



const 



with 



H = 1 



2 



(17) 



(18) 



and < f3 < 1, if sign = 1, and 1 < (3 < 2, if sign = —1. 
It is evident that in this physical condition the property 
of Eq.(j|) applies to the time asymptotic limit, and conse- 
quently in the same time limit the property 5 — H holds 
true. In Section 3A we shall shortly review the FBM, 
which is a matematical idealization fitting the condition 
S = H. In Section 3B we shall illustrate the Levy flight 
diffusion, which is, on the contrary, a mathematical ideal- 
ization incompatible with the existence of a finite second 
moment, thereby implying a striking violation of S = H . 
Finally, in Section 3C we illustrate the Levy walk diffu- 
sion, a compromise between the two earlier conditions, 
in the sense that it is compatible with a finite second 
moment. However, as we shall see, also the Levy walk 
diffusion results in a violation of S = H . 



A. Fractional Brownian Motion 

Fractional Brownian Motion, FBM, is a generalization 
of ordinary Brownian motion introduced by Mandelbrot 
H to describe anomalous diffusion. By definition, the 
FBM of index n is described by the fractional Gaussian 
propagator 



p{x,t) 



1 



exp 



Or- 



(19) 



The variance of this pdf is given by 



B. Levy Flight Diffusion 

Levy Flight Diffusion is described, in the symmetric 
case, by the characteristic function: 



p{k,t) = exp(-K a t \k\ a ) 



(22) 



This type of diffusion is generated by a walker that makes 
jumps with lengths determined by a probability density 
function, A(£), whose asymptotic behavior function has 
the following inverse power law form: 



(23) 



This means |£| ^ a and /i = 1 + a. 

The Fourier inversion of ( p2| ) can be obtained analyti- 
cally by making use of the Fox function Jl^.IkI 



p(x,t) 



1 



1 



U-l *V(M-1) 



H, 



2:2 



_ (1,1/a), (1,1/2) 
" 1} ' (LI), (Ll/2) 



It is evident that this expression fits the scaling definition 
of Eq. (Q), and that consequently, for 1 < fj, < 3, the 
scaling coefficient 5 is 



(24) 



u-l 



It is important to remark that while the CLT yields 
Eq.(||) with 6 = -1 with F being a Gaussian function, the 
Generalized Central Limit Theorem (GCLT) @ yields 
Eq.(||) with i ^ i and F departing from the Gaussian 
form. This departure from Gaussian statistics has the 
striking consequence of making the second moment di- 
verge. In fact, in the asymptotic limit of large x's, we 
get for the distribution p(x, t) the following inverse power 
law form 



lim p(x, t) 

\x\ — >oc 



1 



u < 3 . (25) 



< (x - x) 2 >= 



J dx (x x) p{x, t) 2D f . (20) This makes the second momernt < x 2 > diverge. In the 



We see that FBM is compatible with the asymptotic 
limit of the dynamic approach to diffusion generated by 
Eq. (||) when the fluctuation £(t) is Gaussian. Further- 
more, we see that Eq.(|l9|) fits the scaling condition of 
Eq. and Eq.(p0|) fits the second moment scaling of 
Eq. (|l3|). Thus we conclude that 



5= 1 1 = H 
2 



(21) 



In other words, FBM is a kind of diffusion where the sec- 
ond moment scaling mirrors correctly the scaling prop- 
erty of Eq.(pl). 



more general asymmetric case |14| similar inverse power 
law properties apply to one of the two tails, thereby mak- 
ing the variance < {x — x) 2 > diverge. Consequently, the 
variance method for scaling detection is quite inappropri- 
ate in this case. In the case of real data, the experimental 
observation is done with a finite number of data. This 
means that the variance is finite, thereby giving the im- 
pression that the variance method can be safely adopted 
as a scaling detector. This would lead to misleading re- 
sults, determined by the lack of accurate statistics rather 
than by the genuine statistical properties of the system 
under study. 
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C. A compromise: Levy Walk Diffusion 

Let us now consider another random walk prescription, 
Levy Walk Diffusion |plj|| . This has to do with drawing 
the random numbers t,'s, with the probability density 
V>(r) given by 



^(r) = (/i - 1) 



(T + r)f ' 



(26) 



This means that we can build up an infinite sequence of 
numbers, which are then used to make a random walker 
walk with the following rules. At time t — 0, when the 
first random number, n, is selected, we also toss a coin 
to decide whether the random walker has to move in the 
positive or in the negative direction. The random walker 
walks with a velocity of constant intensity W. Thus, 
tossing a coin serves the purpose of establishing whether 
the velocity of the random walker is W (head) or — W 
(tail) . This condition of uniform motion lasts for a time 
interval of duration n . At the end of this condition of 
uniform motion, a new number, r 2 , is randomly drawn, 
and a new velocity direction is established by another 
coin tossing. The time series {t^} is converted into a time 
series {£j} of either l's or — l's, as follows. We associate 
each Ti to a patch of ni identical symbols, the i-th patch. 
The values of these symbols are fixed to be either l's or 
— l's, according to the coin tossing rule. Then we sew 
the i-th patch to the i — 1-th, on the left, and to the 
i + 1-th, on the right, thereby creating a virtually infinite 
sequence of £i, with values given by either 1 or —1. It 
is evident that the variable x collecting the fluctuations 
£i corresponds to the earlier walking prescription with 
W = 1. The error resulting from the adoption of only 
the integer part of Tj is made irrelevant by the fact that 
the inverse power law nature of Eq. (|26|) with \l < 3 
makes statistically significant the numbers Tj >> 1. It 
is worthe remarking that we shall focus our attention on 
the condition 2 < /z < 3 . 

It is important to stress that the physical condition 
fi > 2 corresponds to the non-vanishing mean time < 
r >, whose explicit expression is: 



< r >= 



T 



(27) 



We denote as event the joint process of random drawing 
of a number and of coin tossing. We consider a time scale 
characterized by the property 



t > T. 



(28) 



It is evident that the number of events that occurred 
prior to a given time t is given by 



t 



< T > 



(29) 



Consequently, at a given time ( »< t >, the position 
x occupied by the random walker is equivalent to the 
superposition of many highly correlated fluctuations £j 
or of n uncorrelated variables rji , whose modulus is equal 
to Wri, with signs randomly assigned by the coin tossing. 
Thus, the probability distribution function, A(t7), is given 
by 



2W v \WJ ' 



(30) 



the analytical form of the function if) being given by 
Eq. ( ^6|) . The variables r\i can be identified with the vari- 
ables £ of Section 3B. As a consequence, the asymptotic 
properties of this probability distribution function are 
the same as those of Eq.(|23|). By applying again the 
GCLT |l4| we obtain the same diffusion process as that 
of Eq.(|22j), and, of course, the same scaling prescription 
as that of Eq.(|J). 

This walk prescription was termed Symmetric Velocity 
Model in Ref. |l5|] and is a form of Levy walk. The Levy 
walk has been originally introduced 1 16 - 1^] as a dynamic 
approach to Levy statistics more realistic than the Levy 
flight of Section 3B. The reason for this conviction is that 
with the Levy walk the walker makes a step of a given 
length in a time proportional to this length, while with 
the Levy flight the walker makes instantaneous jumps of 
arbitrarily length, a property judged, in fact, to be some- 
what unrealistic. In this paper, the Levy walk serves the 
very useful purpose of explaining why the emergence of 
Levy statistics does not imply a total failure of the meth- 
ods relating scaling to variance. In this case, in fact, the 
second moment is finite, and this property does not de- 
pend on the lack of sufficient statistics. It depends on 
the fact that no jump can occur with a length of inten- 
sity larger than Wt. In this specific case the renewal 
theory jl9| prescribes that the correlation function <E> j (t) 
is related to the waiting time distribution ^p{r) by the 
important equation 



l 



< r > 



+ 00 



(*' - t)ip{t')d£. 



(31) 



From this important relation, using Eq.(|26|), we derive 
the following analytical expression for $£(t): 



T 



t + T 



with 



p = n 



(32) 



(33) 



At this stage we are equipped to derive the asymptotic 
properties of the pdf second moment. The existence of 
the correlation function of Eq. ([32]) allows us to use again 
Eq. din) so as to reach quickly the conclusion that 



4 



H = 



4-M 



(34) 



This result is, in fact, obtained by plugging Eq.(|33|) into 
Eq. (|l8|) . There is no reason to identify H with 6, in this 
case. Rather, if we trust the GCLT and, consequently, 
the scaling prescription of Eq.(p4|), we see that 8 is related 
to H by 



5 = 



1 



3-2i? 



(35) 



We shall prove that the DE method detects this correct 
scaling, whereas, of course, the methods resting on vari- 
ance cannot, even if the exponent H they detect has an 
interesting physical meaning. 

It is worth stressing that the physical meaning of H, 
determined by the variance method, changes from case 
to case and depends on both the statistics of the process 
and the walking rule adopted to change the time series 
into a diffusion process. To illustrate this issue, let us 
multiply the numbers Tj by either 1 or —1 by tossing a 
coin. Then, let us denote the resulting numbers by the 
symbol £. We obtain a sequence indistinguishable from 
that discussed in Section 3B. In this case, as we shall 
see, in Section 6B, the resulting value of H = 0.5 reflects 
the fact that the numbers n are uncorrelated and, by no 
means, that the resulting statistics is Gaussian. 



IV. THE DIFFUSION ALGORITHM 



Let us consider a sequence of M numbers 



6, » = 1,...,M. 



(36) 



The purpose of the DEA algorithm is to establish the pos- 
sible existence of a scaling, either normal or anomalous, 
in the most efficient way as possible without altering the 
data with any form of detrending. First of all, let us se- 
lect an integer number I, fitting the condition 1 < I < M. 
This integer number will be referred to by us as "time" . 
For any given time / we can find M — I + 1 sub-sequences 
defined by 



M = t 



ith s = 0, 



, M — I. 



(37) 



For any of these sub-sequences we build up a diffusion 
trajectory, s, defined by the position 



,(«) 



(0 = £d s) = ££ + 



(38) 



Let us imagine this position as that of a sort of Brow- 
nian particle that at regular intervals of time has been 
jumping forward of backward according to the prescrip- 
tion of the corresponding sub-sequence of Eq. ( |37| ) . This 
means that the particle before reaching the position that 



it holds at time I has been making I jumps. The jump 
made at the i-th step has the intensity \Q \ and is for- 

(s) 

ward or backward according to whether the number Q 
is positive or negative. We adopt the paradigm of Brown- 
ian motion for the tutorial purpose of illustrating how the 
diffusion algorithm work. Actually, the ultimate task of 
this algorithm is to express in a quantitative way the de- 
parture of the observed process from the statistical prop- 
erties of ordinary Brownian motion. 

We are now ready to evaluate the entropy of this diffu- 
sion process. To do that we have to partition the x-axis 
into cells of size e(l). When this partition is made, we 
have to label the cells. We count how many particles are 
found in the same cell at a given time We denote this 
number by iVj(Z). Then we use this number to determine 
the probability that a particle can be found in the i-th 
cell at time I, Pi(l), by means of 



Ni{l) 



(M-i + 1) 



(39) 



At this stage the entropy of the diffusion process at the 
time I is determined and reads 



&(0 = -I>(0 HpM- 



(40) 



The easiest way to proceed with the choice of the cell size, 
e(Z), is to assume it to be a fraction of the square root 
of the variance of the fluctuation £(i), and consequently 
independent of /. 



V. THE METHODS OF ANALYSIS BASED ON 
VARIANCE 

In this section we review four methods of time series 
analysis. The last three methods are very popular, and 
are all related, to a somewhat different extent, to the 
first, based on the direct evaluation of variance. 



A. SDA 

The direct evaluation of variance, by means of SDA, 
is probably the most natural method of variance detec- 
tion. This method was used, for instance, in Ref. [||. 
The starting point is given by the diffusion algorithm of 
Section 4, Eq.(|38|). All trajectories start from the origin 
x(l = 0) = 0. With increasing time I, the sub-sequences 
generate a diffusion process. At each time I, it is possible 
to calculate the Standard Deviation (SD) of the position 
of the N — I trajectory with the well known expression: 



SD(l) 



Eti (x(<Hi)-my 

N -l-l 



(41) 
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where x(l) is the average of the positions of the N — I 
sub-trajectories at time I. We note that the prescrip- 
tion illustrated in Section 4 to define the trajectories of 
this diffusion process is based on overlapping windows. 
This means that the trajectories are not totally indepen- 
dent of one another. In principle, we can also adopt a 
non-overlapping window method. In this case the largest 
trajectory that we can build up with the whole sequence 
is divided in M = int(N/l) smaller trajectories of size 
I (with int(x) we denote the integer part of x). Thus, 
the quantity SD(l) can be referred to many trajecto- 
ries totally independent the one from the other. The 
advantage of using many independent trajectories is bal- 
anced, though, by the disadvantage of statistics poorer 
than those obtained by using the overlapping window 
method. Therefore, in this paper we use the method of 
overlapping windows. 

According to the traditional wisdom of the methods 
based on variance, the existence of scaling is assessed by 
observing, with numerical methods, the following prop- 
erties 



SD(l) oc l H . 



(42) 



The exponent H is interpreted as the scaling exponent. 
As discussed in Section 3, there is no guarantee that this 
exponent coincides with the genuine scaling 8. This is 
the reason why with all the methods of analysis of this 
section we shall use the symbol H to denote the result 
of the statistical analysis. To assess whether this is the 
true scaling or not, it is necessary to use also the DEA. 



The reservoir neither overflows nor empties during the 
period of r years if its storage capacity is larger than the 
difference, R(t), between the maximum and minimum 
amounts of water contained in the reservoir. R(t) is 



R(t) = max x(t, t) 

Kt<T 



min x(t, t) . 

Kt<T 



(45) 



For getting a dimensionless value, Hurst divided R(t) 
by the standard deviation S(t) of the data during the r 
years: 



S(r) 



\ 



^£>-<£> T ) 



(46) 



Hurst observed that many phenomena are very well de- 
scribed by the following scaling relation: 



S(t) 



(47) 



The exponent H (denoted by the letter K by Hurst) was 
called Hurst exponent, and consequently denoted by the 
letter H, by Mandelbrot §]. The work of Mandelbrot 
made the RRA become popular as a technique of scaling 
detection. In the case of the Nile river, Hurst measured 
an exponent H = 0.9. This means that the Nile is char- 
acterized by a long range persistence that requires un- 
usually high barriers, such as the Aswan High Dam, to 
contain damage and rein in the floods. This paper proves 
that this perspective, correct in the Gaussian case, is in 
general misleading. 



B. RRA 

The RRA was introduced by Hurst in 1965, in the 
work, Long-Term Storage: An Experimental Study [Q, 
for the main purpose of studying the water storage of 
the Nile river. The problem was to design a reservoir, 
which never overflows or empties, based upon the given 
record of observed discharges from a lake. Let us suppose 
that is the amount of water flowing from a lake to a 
reservoir for each year. The problem is to determine the 
needed capacity of the reservoir under the condition that 
each year the reservoir releases a volume of water equal 
to the mean influx. In r years, the average influx is 

<£>r=~y>. (43) 

The amount of water accumulated in the reservoir in t 
years is 

t 

z(*,r) = > T ) . (44) 

i=l 



C. DFA 

The DFA was introduced in 1994 by the authors of 
Ref. S. Since 1994, hundreds of papers, which analyze 
fractal properties of time series with the DFA, have been 
published. In summary DFA works as follows. Given 
a time sequence {&} (i — 1, N), the DFA is based 
upon the following steps. First, the entire sequence of 
length N is integrated, thereby leading to 

t 

i=l 

where 

1 N 

Second, the time series is divided into int(N/l) non- 
overlapping boxes. The number I, which indicates the 
size of the box, is an integer smaller than N. A local 
trend is defined for each box by fitting the data in the 
box. The linear least-squares fit may be done with a 
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polynomial function of order n > E(J • Let xi (t) be the 
local trend built with boxes of size I. Third, a detrended 
walk is defined as the difference between the original walk 
and the local trend given by the linear least-squares fit 
according to the following relation 



Xi(t)=x(t)-xi(t) 



(50) 



Finally the mean squared displacement of the detrended 
walk is calculated, that is, 



1 - 

^(0 = ^ £[*«(*)] 



(51) 



t=i 



The application of this method of analysis to a number 
of complex systems (see, for instance, Refs. ||^0| ) shows 
that 



F D (n) cx l H . 



(52) 



Again, according to the traditional wisdom of the meth- 
ods based on variance, the exponent H is considered to 
be a scaling exponent. Thus the extent of the departure 
from the ordinary condition of Brownian motion is given 
by |F-0.5| > 0. 



D. SWA 

The SWA is a new and powerful method for studying 
the fractal properties of variance || . SWA decomposes 
the sample variance of a time series on a scale-by- scale 
basis. Wavelet Transform makes use of scaling wavelets 
that have the characteristics of being localized in space 
and in frequencies. The wavelets are characterized by 
the fact that they can be localized in the space and de- 
pend upon a scaling coefficient that gives the width of 
the wavelet. Two typical wavelets widely used are the 
Haar wavelet and the Mexican hat wavelet ||. A width 
coefficient r defines the scale analyzed by the wavelet. 
Given a signal £(u), the Continuous Wavelet Transform 
is defined by 



W(T,t) 



Vv,t(w) du . 



(53) 



The original signal can be recovered from its Continuous 
Wavelet Transform via 



W(T,t) iJ Ttt (u) dt 



%. (54, 



Finally, it is possible to prove that 



du = — 

C 4> 



W 2 (T,t) dt 



dr 

—2=1 Sw(r) dr 



The function W 2 (t, t) / r 2 defines an energy density func- 
tion that decomposes the energy across different scales 
and times. Eq. (|5^) is the wavelet equivalent to the 
Fourier Parseval's theorem. The function Sw{t), defined 
by Eq. (|55|) , is the wavelet spectral density function that 
gives the contribution to the energy at the scale r. 

From Ref . M , we derive that SWA applied to studying 
a noisy seqeuence {&} , at the scale r, yields 



(55) 



S w (t) cx T°. 



(56) 



The exponent a is related to the variance scaling expo- 
nent H in the same way as in the conventional Fourier 
Analysis. Therefore, a = 2H — 1 for the SWA of the 
noise, and a = 2H for the SWA of the integral of the 
noise. The connection with the methods of scaling de- 
tection based on variance is evident. 



VI. ARTIFICIAL SEQUENCE ANALYSIS 

In this section we verify the theoretical predictions of 
the previous sections about the pdf scaling exponent 8 
and the variance scaling exponent, H, by using artificial 
sequences of five million data. With the help of artificial 
time series, we compare the methods of analysis based on 
variance with the DEA. We prove that the DEA always 
determines the true scaling 5, whereas the variance based 
methods detect the true scaling only in the Gaussian case. 
Thus, in the Levy case, only the DEA reveals the true 
scaling. The accuracy of all the numerical points is esti- 
mated to be of about 1%, for all the cases but the short- 
time regime of the highly subdiffusional regime studied 
in Fig.l and Fig. 2. In this case the error is estimated to 
be slighltly larger, of about 2%. To make easier for the 
reader to interpret the figures, we have not reported the 
error bars. 



A. Gauss statistics: Fractional Brownian diffusion 

Fractional Brownian diffusion is produced by Frac- 
tional Brownian noise. We generate a time series {^H,t} 
of N data by using the original algorithm by Mandelbrot, 
illustrated in the book of Feder ^l| . Our choice is moti- 
vated only by historical reasons, and it is beyond the pur- 
pose of this paper to discuss the computational relevance 
of other algorithms, more recently proposed to generate 
FBM Chosen a value of H € [0 : 1], let {0J be a 

set of Gaussian random variables with unit variance and 
zero mean. The discrete fractional Brownian increment 
is given by 




x H (t-l) 



;H-0.5 



h+n(M+t)-i + 
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i(M-l) 

£ 

i=l 



((n + i) 



if-0.5 



:H-0.5\ 



'l+n(M-l+t)-i f j 



where M is an integer that should be moderately large, 
and n indicates the number of the fractional steps for 
each unit time. Note that the time t is discrete. In the 
simulation, good results are obtained with M = 1000 and 
n = 10. The time series {£,H,t} is then used for generating 
a diffusion process with the trajectories (pgj). 

According to the theoretical arguments of Section 3, we 
expect 5 = H in this case. To confirm this expectation by 
means of the statistical analysis we generate five different 
sequences, with the following values of H: (1) H = 0.8, 
(2) H = 0.6, (3) H = 0.5, (4) H = 0.4, (5) H = 0.2. We 
analyze these sequences with the SDA (Fig. 1) and with 
the DEA (Fig. 2). The results of the numerical analysis 
fully confirm our expectation. Let us see all this in more 
detail. For illustration convenience, in Fig. 1 we show 
the normalized standard deviation defined by 



NSD{1) 



SD(l) 



(57) 



where SD(l) is defined by Eq.(^lj). With this choice, 
at I = 1 all the numerical results yield, in the ordi- 
nate axis, the same value, equal to the unity. For the 
same reason, in Fig. 2, we plot the entropy difference 
S(l) — 5(1), thereby making all five numerical curves de- 
part from the same ordinate value at I = 1. In both 
figures the straigh lines are the results of a fitting pro- 
cedure, based on Jsd(1) = l H + Ksd in Fig. 1, and on 
f E (l) = S\n(t)+K E , in Fig. 2. Note that only the levels 
of the straight lines of these two figures, namely the val- 
ues Ksd's and -?Ce's, whose actual value is of no interest, 
are determined by the fitting procedure. The parameters 
H of the straight lines of Fig. 1 are given by the earlier 
mentioned set of values, selected to create the five artifi- 
cial sequences. The parameters S of the straight lines of 
Fig. 2 are the corresponding scaling values, prescribed 
by the theoretical remarks of Section 3, namely S = H 
for each curve. The extremely good agreement between 
theory and numerical experiment, shown in Fig. 1 and 
Fig. 2, proves that, as expected, in the case of FBM, the 
condition H = S really holds true. 

It is remarkable that for almost all the values of H the 
parameters Ksd and Ke are very close to zero. This 
is a consequence of a peculiar property of FBM. In gen- 
eral, the statistical analysis of times series is affected by 
the transition from dynamics to thermopdynamics. The 
short-time regime is a kind of dynamic regime and the 
scaling regime is a kind of thermodynamic regime. It 
takes time to make a transition from the dynamic to the 
thermodynamic regime. This property is made especially 
evident by the analysis of Levy walk [^5) . In the ideal case 
of FBM, however, the transition time is expected to be 
equal to zero. This means that, in the case of ideal FBM, 



the parameters Ksd and Ke should vanish. This con- 
straint is satisfactorily fulfilled by all the curves of Fig. 
1 and Fig. 2, but those corresponding to H = 5 = 0.2. 
This seems to be a consequence of the numerical inaccu- 
racy, enhanced by the highly antipersistent condition of 
this case. 



B. Levy statistics: flight and walk diffusion 

We generate five time series of numbers, {r^}, dis- 
tributed according to the following inverse power law 



iKr) = (M - 1) 



(t + r y 



(58) 



We select the following values of /x: (1) \i = 2.8, (2) 
fx = 2.6, (3) n = 2.5, (4) n = 2.4 and (5) ii = 2.2. 
We also select the numbers Si, equal to 1 or to —1, ran- 
domly according to the coin tossing. We generate the 
Levy flight of Section 3B, by building up the new se- 
quence where £j = s^r-i. As to the Levy walk, we 
derive it, as explained in Section 3C, by sewing together 
into a single sequences many patches. These patches are 
filled with either an integer number of l's or an integer 
number of —1, according to the coin tossing prescription. 
The distribution density of the patch length is the same 
as that of Eq. d5S|). As a consequence, the Levy flight 
and the Levy walk, in the asymptotic time limit have the 
same scaling, given by Eq. (|24|) . However, as explained 
in Section 3C, the Levy walk is expected to result in a 
given H : predicted by Eq.(|34|). In Table I we have re- 
ported for the reader conveniences the values of S and H 
associated to each /i of the five artificial sequences that 
we are considering for our numerical check. 

Fig. 3 shows the DEA at work on the time series gen- 
erating Levy flight. The straight lines are fitting func- 
tions of the form /b(Z) = <51n(t) + Ke- The values of 
the parameters <5, are not fitting parameters, but are de- 
termined by the theoretical prescription of Table I. Figs. 
4 and 5 illustrate the results of the SDA and RRA, re- 
spectively, applied to the same five time series of Fig. 3. 
Both figures yield for H a value independent of /i. This 
value is H = 0.5 in all cases. According, to the tradi- 
tional wisdom, this would suggest the wrong conclusion 
that we are in the presence of ordinary Brownian motion. 
We are not, and the DEA is warning us from making this 
wrong conclusion. The reason for this misleading result 
is that these techniques are determined by both the finite 
value of the variance, due to statistical limitation and the 
memoryless nature of the sequence {ri}. The smaller the 
parameter /x, the smaller the variance, as shown by Fig. 
4. The RRA eliminates this spreading, due to the fact 
that it normalizes the data by dividing by the standard 
deviation. 

Figs. 6-10 refer to the time series generating Levy 
walk. Fig. 6 illustrates the result of the DEA. As in 



8 



Fig. 3 the straight lines are fitting functions of the form 
= 51n(i) + Ke, and, again the parameters 5 are 
those reported in Table I, and are determined by the- 
oretical prescription 5 = l/(/i — 1). Figs. 7, 8, 9 and 
10 illustrate the results of RRA, SDA, DFA and SWA, 
respectively. In these cases the straight lines are fitting 
functions with the form /(/) = Kl . The parameters H 
are not fitting parameters, and their values, reported in 
Table I, are determined by H = (4 — /i)/2. It is remark- 
able that, for all these techniques, the same value of /i 
yields the same value of H, as the fitting curves show. 
This fully supports the conclusion that these seemingly 
different methods of analysis are actually equivalent, and 
that all of them are reliable only in the FBM case. On 
the other hand, we notice that the values of 5 and the val- 
ues of H reported in Table I fit the condition of Eq. j35| ) 
and this a strong evidence that the statistics generated 
by the time series is Levy statistics. This means that the 
disagreement between the DEA and the ordinary tech- 
niques of analysis can be used for the important purpose 
of defining the nature of statistics generated by strange 
kinetics. 

VII. SIGNIFICANCE OF THE RESULTS 
OBTAINED 

This paper affords a compelling evidence that the DEA 
is the only method leading in all conditions to the de- 
tection of the correct scaling exponent 8. In the case 
of a sequence of random numbers, that according to the 
GCLT should result in an anomalous scaling, the popular 
Hurst method would lead to the wrong conclusion that 
the process observed is equivalent to ordinary Brownian 
motion. All the traditional methods would lead to quite 
correct conclusion only in the case of Gaussian statis- 
tics, a condition which does not mean, of course, ordi- 
nary Brownian diffusion, as made evident by the FBM 
theory of Mandelbrot. It is also evident that these tradi- 
tional methods ought not to be abandoned, even if they 
have to be used with caution. The results of Section 
6B prove that the departure of 5 from H is a clear in- 
dication of the occurrence of Levy statistics. More in 
general, the departure of the traditional methods from 
the DEA finding might be used to shed light on statis- 
tics as well as on dynamics. Paraphrasing the title of 
a recent paper p3|, "Do strange kinetics imply unusual 
thermodynamics?" , we can say that one of the basic prob- 
lems concerning complex systems is that of establishing 
if anomalous diffusion (strange kinetics) is compatible or 
not with ordinary Gaussian distribution (ordinary ther- 
modynamics) . The connection between thermodynamics 
and statistical equlibrium does not need any explana- 
tion. Here we are subtly assuming that scaling can be 
perceived as a form of thermodynamic equilibrium. If 
we look to the results of this paper from within this per- 



spective, we can conclude that FBM is an example of 
strange kinetic compatible with ordinary thermodynam- 
ics. We can thus conclude that the joint use of DEA and 
techniques of analysis based on variance can assess when 
strange kinetics force the complex system to depart from 
ordinar thermodynamics. 
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FIG. 1. SDA in action on the five time series of Fractional Brownian noise of Section 6A. In ordinate we plot SD(l)/SD(l) 
as a function of I. The straight lines are fitting functions with the form /sd(£) = l H + Ksd- The values of H are not the 
result of the fitting procedure, but they correspond to the set of values used to create the five artificial sequences of Section 6 
A. From the top to the bottom these values are: (1) H = 0.8; (2) H = 0.6; (3) H = 0.5; (4) H = 0.4; (5) H = 0.2. 
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FIG. 2. DEA of the five time series of Fractional Brownian noise of Section 6A. For illustration convenience, in ordi- 
nate we plot, as a function of I, the entropy increment S(l) — S(l). The straight lines are fitting functions with the form 
/b(0 = 6\n(t) + Ke- The values of H are not the result of the fitting procedure, but they correspond to the set of values 
used to create the five artificial sequences of Section 6 A, according to the theoretical prescription 8 — H . From the top to the 
bottom these values are: (1) 6 = 0.8; (2) 6 = 0.6; (3) S = 0.5; (4) 6 = 0.4; (5) S = 0.2. 
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1 10 100 j 1000 10000 100000 

FIG. 3. DEA of the five Levy flight time series of Section 6B. The straight lines are fitting functions with the form 
/b(0 = 8\n(l) + Kb- The values of 8 are not the result of the fitting procedure, but they correspond to the set of val- 
ues {/x}, used to create the five artificial sequences of Section 6 B, according to the theoretical prescription 5 = — 1). 
From the top to the bottom these values, and the corresponding fitting parameters Ke, are: (1) 8 = 0.833, Ke = 2.25; (2) 
8 = 0.714, K E = 2.15; (3) 8 = 0.667, K E = 2.11; (4) 8 = 0.625, K E = 2.15; (5) 8 = 0.556, K E = 2.15. 
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FIG. 4. SDA of the five Levy flight time series of Section 6B. The straigth lines are fitting functions with the form 
,(/) = K D l H . From the top to the bottom we have: (1) H = 0.5, K D = 190; (2) H = 0.5, K D = 29; (3) H = 0.5, K D = 16; 
H = 0.5, K D = 9.5; (5) H = 0.5, K D = 4.1. 
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FIG. 5. RRA of the five Levy flight time series of Section 6B. All the five cases are fitted by the straight line corresponding 
to the fitting function /s/d{1) = K S /d l H , with K S /d = 11 and H — 0.5. 
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FIG. 6. DEA of the five Levy walk time series of Section 6B. The straight lines are fitting functions with the form 
/e(1) = Sln(l) + Ke- The values of 8 are not the result of the fitting procedure, but they correspond to the set of val- 
ues {fj,}, used to create the five artificial sequences of Section 6B, according to the theoretical prescription 5 — l/(/n — 1). 
From the top to the bottom these values, and the corresponding fitting parameters Ke, are: (1) 8 — 0.833, Ke = 0.35; (2) 
<5 = 0.714, K E = 0.93; (3) S = 0.667, K E = 1.05; (4) S = 0.625, K E = 1.15; (5) 8 = 0.556, K E = 1.25. 
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FIG. 7. RRA of the five Levy walk time series of Section 6B. The straight lines are fitting functions with the form 
/s/d(0 = K S /d l H ■ The values of H are not the result of the fitting procedure, but they correspond to the set of values 
{fi}, used to create the five artificial sequences of Section 6B, according to the theoretical prescription H = (4 — n)/2. From 
the top to the bottom these values, and the corresponding fitting parameters Ksd, are: (1) H = 0.9, Kg/D = 0.15; (2) 
H = 0.8, K s/D = 0.25; (3) H = 0.75, K s/D = 0.75; (4) H = 0.7, K s/D = 0.39; (5) H = 0.6, K s/D = 0.62. 
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FIG. 8. SDA of the five Levy walk time series of Section 6B. The straight lines are fitting functions with the form 
IsdH) = Ksd Z H .The values of H are not the result of the fitting procedure, but they correspond to the set of values {/u,}, 
used to create the five artificial sequences of Section 6B, according to the theoretical prescription H = (4 — /n)/2. From the top 
to the bottom these values, and the corresponding fitting parameters Kd, are(I) H — 0.9, Kd = 0.45; (2) H = 0.8, Kd = 0.48; 
(3) H = 0.75, K D = 0.53; (4) H = 0.7, K D = 0.6; (5) H = 0.6, K D = 0.7. 
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FIG. 9. DFA of the five Levy walk time series of Section 6B. The straight lines are fitting functions with the form 
/f(0 = Kf l H ■ The values of H are not the result of the fitting procedure, but they correspond to the set of values {fi}, used 
to create the five artificial sequences of Section 6B, according to the theoretical prescription H = (4 — fi)/2. From the top to 
the bottom these values, and the corresponding fitting parameters Kf, are: (1) H = 0.9, Kf = 0.043; (2) H — 0.8, Kf = 0.067; 
(3) H = 0.75, K F = 0.0.082; (4) H = 0.7, K F = 0.0.104; (5) H = 0.6, K F = 0.15. 
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FIG. 10. SWA of the five Levy walk time series of Section 6B. The straight lines are fitting functions with the form 
fw (0 = Kw l H ■ The values of H are not the result of the fitting procedure, but they correspond to the set of values {fi}, used 
to create the five artificial sequences of Section 6B, according to the theoretical prescription H = (4 — /x)/2. From the top to 
the bottom these values, and the corresponding fitting parameters Kw, are: (1) H = 0.9, Kw = 0.115; (2) H = 0.8, Kw = 015; 
(3) H = 0.75, K w = 0.17; (4) H = 0.7, K w = 0.2; (5) H = Q£,K W = 0.23. The wavelet spectral density is calculated using 
the Maximun Overlap Discrete Wavelet Transform with the Daubechies H4 discrete wavelet [8] . 
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A* 


H 


.5 


2.2 


0.90 


0.833 


2.4 


0.80 


0.714 


2.5 


0.75 


0.667 


2.6 


0.70 


0.625 


2.8 


0.60 


0.556 



TABLE I. Theoretical relation between the waiting time distribution power exponent fi and the variance scaling exponent 
H and the pdf scaling exponent 8. 



21 



